Schibsted YAMS

How to build and maintain a service handling thousands of req/s with minimal dedication

Who are you?



Daniel Caballero

Devops/SRE Engineer @ Schibsted

Part time (Devops) lecturer @ La Salle University

So... I work

... I (kind of) teach

... I (try to) program...

... I (would like to) rock...

... and I live

So... I value my time (a lot)

I really don't like to waste it



  • Resolving incidents
  • Reactive work
  • Repetitive work

Schibsgrñvahed..WHAT??

What is Schibsted?

And SPT?

It's about convergence through global solutions

What's behind global components / services?

You build it, you run it

Probably nothing new on the horizon for you

That means there's no ops/support/systems/devops team

{
    "format": "webp",
    "watermark": {
        "location": "north",
        "margin": "20px",
        "dimension": "20%"
    },
    "actions": [
        {
            "resize": {
                "width": 300,
                "fit": {
                    "type": "clip"
                }
            }
        }
    ],
    "quality": 90
}
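The JSON above reads like a transformation rule: an output format, a quality setting, a watermark and a list of actions. As a rough sketch of how such a rule could be interpreted, the snippet below parses it and computes the output size for a source image. The function names `clip_size` and `planned_output` are hypothetical, and the meaning given to the "clip" fit (scale down to fit the box, keep the aspect ratio, never upscale) is an assumption, not the documented YAMS behaviour.

```python
import json

# The rule from the slide, embedded as a string so the sketch is self-contained.
RULE_JSON = """
{
    "format": "webp",
    "watermark": {"location": "north", "margin": "20px", "dimension": "20%"},
    "actions": [
        {"resize": {"width": 300, "fit": {"type": "clip"}}}
    ],
    "quality": 90
}
"""


def clip_size(src_w, src_h, target_w=None, target_h=None):
    """Scale down to fit the target box, keeping the aspect ratio.

    ASSUMPTION: this is what "clip" means; the image is never upscaled.
    """
    scale = 1.0
    if target_w is not None:
        scale = min(scale, target_w / src_w)
    if target_h is not None:
        scale = min(scale, target_h / src_h)
    return max(1, round(src_w * scale)), max(1, round(src_h * scale))


def planned_output(rule, src_w, src_h):
    """Walk the actions in order and return the planned output description."""
    width, height = src_w, src_h
    for action in rule.get("actions", []):
        resize = action.get("resize")
        if resize and resize.get("fit", {}).get("type") == "clip":
            width, height = clip_size(
                width, height, resize.get("width"), resize.get("height")
            )
    return {
        "format": rule["format"],
        "quality": rule["quality"],
        "width": width,
        "height": height,
    }


rule = json.loads(RULE_JSON)
plan = planned_output(rule, 1200, 800)  # a hypothetical 1200x800 source image
print(plan)
```

Keeping the rule as plain data is what makes on-the-fly transformations practical: the same stored original can be served under many rules without reprocessing.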

Why not offline transformations?

  • Lots of (user) contents. Reprocessing hurts
  • Sites are dynamic by nature. Some of them do adapt the content to the device.

This may sound familiar to you...

CDNs able to transform contents on the fly:

  • As a native functionality...
  • Or through lambdas / edge computing

SaaS solutions:

Opensource solutions:

So...

Why did you invest time on that?

Why are you here?

Availability

Low latency

We are the owners of the backlog

Even though sometimes that is not so useful...

Low costs

High usage

Does not require high maintenance

  • (Almost) No incidents.
  • New sites do not require high onboarding efforts.

We would be able to maintain this with half an engineer

We don't (usually) like to cut people in half, so let's say one engineer

But be careful: if you stop developing a service, you kill the service

  • Stops being competitive
  • It quickly becomes legacy
  • Disconnects from current business needs

So we try to convince the company it requires, at least, the focus of two engineers

But oncall rotations

Ok. Let's say 3-4. And we accept an extra project.

How did you achieve that?

Combination of...

Don't you see some similarity?

Team

Agile

Benefiting from other Sch services

Reuse of other colleagues' code/components.

Big department portfolio:
  • AWS bootstrap
  • Vulnerability scans
  • TravisCI, Artifactory, Spinnaker

Collaboration + transparency mindset

  • Internal RFCs
  • Consumers as contributors
  • Internal opensource model (full visibility of GitHub repos)

Product

Actual need

The project started when several sites realized they had a common problem

Limited scope

  • API as the point of interaction
  • No business logic. "Dumb" service
  • Almost no functionality that is used by a single site, or by no one

Tech

Everything as code

No space for "one time" actions.

  • Alerting configuration by code
  • Infrastructure
  • (most of the) Configuration
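As an illustration of "alerting configuration by code", here is a minimal sketch in which alerts are plain data that lives in the repository and gets validated in the pipeline before it is applied. The `Alert` fields, the metric names and the thresholds are all made up for the example; they are not the actual YAMS alerts or any monitoring vendor's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """A hypothetical alert definition kept under version control."""
    name: str
    metric: str
    threshold: float
    window_minutes: int


# Illustrative values only.
ALERTS = [
    Alert("high-5xx-rate", "http_5xx_ratio", 0.01, 5),
    Alert("slow-p99", "latency_p99_ms", 500.0, 10),
]


def validate(alerts):
    """Return a list of human-readable problems; empty means the config is OK."""
    problems = []
    seen = set()
    for alert in alerts:
        if alert.name in seen:
            problems.append(f"{alert.name}: duplicate alert name")
        seen.add(alert.name)
        if alert.threshold <= 0:
            problems.append(f"{alert.name}: threshold must be positive")
        if alert.window_minutes <= 0:
            problems.append(f"{alert.name}: window must be positive")
    return problems


print(validate(ALERTS))
```

Because a change to an alert is a reviewed commit rather than a click in a console, there is no room for "one time" actions that nobody remembers.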

Good design/tech choices

(but not perfect / or the best, for sure)

  • Immutable pattern
  • AWS + Netflix stack + Microservices
  • libvips
  • Non-blocking services

Continuous Delivery

And the capacity to incorporate everything into the pipeline.

Small deltas. Iterative deliveries. Low risk deployments.

Look forward, rather than investing lots of time in your rollback strategy

TODO: pipeline image

0-error target

Yeah, Google SRE book and error budgets...

... but it helped us to understand, tune, and earn the trust of Sch sites, avoiding major disruptions when big sites onboarded, and minimizing the chance of "unplanned / reactive" activities
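For contrast with the error-budget idea, the arithmetic can be sketched in a few lines. The function names and the request counts are illustrative assumptions, not YAMS figures. With an availability target below 100% some failures are "allowed"; with a 0-error target the budget is zero, so every single failure is worth looking at.

```python
def error_budget(total_requests: int, slo: float) -> int:
    """Number of failed requests allowed by an availability target `slo`."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in (0, 1]")
    return round(total_requests * (1.0 - slo))


def budget_left(total_requests: int, failed_requests: int, slo: float) -> int:
    """Remaining budget; a negative value means the target was missed."""
    return error_budget(total_requests, slo) - failed_requests


# Hypothetical traffic: one million requests in the period.
print(error_budget(1_000_000, 0.999))   # budget with a 99.9% target
print(error_budget(1_000_000, 1.0))     # budget with a 0-error target
print(budget_left(1_000_000, 3, 1.0))   # any failure exceeds a zero budget
```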

We also rely on a "good enough" test suite (unit + integration + acceptance) with good coverage of all API functionality

  • New error conditions mean new tests
  • If tests are green, almost (TM) no space for surprises

Obs+Troubleshooting toolkit

  • When shit happens, at least, let's minimize pain
  • Enables experimentation culture

And what did you do wrong?

The Refactor (TM)

Complete refactor. New platform in parallel to deliver a new version of the API

  • APIv0
  • APIv1

Microservices split

Domain driven design... coupling of some services

Nice solution... but

Why not Docker/k8s?

  • Local tests
  • YAMS Portal/Frontend already there
  • Migration exercise

Why not a Service Mesh?

And Prometheus?

We may.

And it may be a good moment to consider OpenCensus.

Actual (& not so far) future

Extra compression

  • Currently libjpeg-turbo
  • Good for performance, pretty decent results, but...
  • MozJPEG, API-compatible with libjpeg
  • Guetzli, from Google

Bringing the service closer to the business

  • Image uploader
  • Online image editor
  • Integration with data services
    • Automatic classification
    • Nudity detector
    • Car plate pixelation
  • More regions/cloud providers deployments

  • Video transcoding...

Actual transformation pipelines

More adoption?

Some major Marketplaces are not using the service, yet

Simulating dependency failures

Hoverfly: similar in concept to the Simian Army from Netflix, but specialized in API degradations
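The idea of degrading a dependency on purpose can be sketched without any tooling. The snippet below is not Hoverfly's API; it is a plain Python wrapper, with hypothetical names (`flaky`, `fetch_original`, `fetch_with_fallback`), that makes a call fail at a configurable rate so the caller's fallback path actually gets exercised.

```python
import random


class DependencyError(Exception):
    """Raised when the (simulated) dependency fails."""


def flaky(call, failure_rate, rng):
    """Wrap `call` so that it fails with probability `failure_rate`."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise DependencyError("injected failure")
        return call(*args, **kwargs)
    return wrapper


def fetch_original(image_id):
    """Stand-in for a call to a real dependency (e.g. an object store)."""
    return f"bytes-of-{image_id}"


def fetch_with_fallback(fetch, image_id, fallback="placeholder"):
    """The behaviour under test: degrade gracefully instead of erroring."""
    try:
        return fetch(image_id)
    except DependencyError:
        return fallback


rng = random.Random(0)  # seeded so runs are reproducible
always_down = flaky(fetch_original, 1.0, rng)
always_up = flaky(fetch_original, 0.0, rng)
print(fetch_with_fallback(always_down, "cat.jpg"))
print(fetch_with_fallback(always_up, "cat.jpg"))
```

A tool that sits between services does the same thing at the network level, which has the advantage of not touching application code.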

Stress test as part of the pipeline
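One possible shape for such a pipeline step, as a sketch: collect latencies from a load run, compute a percentile, and fail the build if it exceeds a limit. The nearest-rank percentile method, the function names and the numbers below are assumptions chosen for the example, not the team's actual gate.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def gate(latencies_ms, p99_limit_ms):
    """True if the run is good enough for the pipeline to continue."""
    return percentile(latencies_ms, 99) <= p99_limit_ms


# Hypothetical latencies from a load run: 1 ms, 2 ms, ..., 100 ms.
latencies = list(range(1, 101))
print(percentile(latencies, 99))
print(gate(latencies, p99_limit_ms=120))
```

In a real pipeline the step would exit with a non-zero status when `gate` returns False, so a latency regression blocks the deployment like any failing test.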

GCP as an accelerator for South America?

Before closing...

Are you going to opensource it?

  • Schibsted does support contributing to opensource projects
  • As well as releasing internal code
  • Problem: Not following a "contribute-first" approach
  • But we have already contributed to bimg, Zuul, KrakenD...

Are you going to offer this SaaS to other companies?

Latencymap

api noiser

Final reminder

Be Rx in the code...

But not in real life

TODO: firefighter image forbidden

Keep the heroes in your comics

TODO: comic image

As, eventually

A CPU is quicker than you at handling interrupts

Your company will eventually not pay just for hero-style engineers

Many thanks...

Sch*

And especially...

Edge colleagues

Other Qs?

dan . caba at google (dot)com

Your opinion is very important to me

  • Find my lecture on the schedule in the eventory app
  • Rate and comment on my performance

Thanks for your feedback. It will help me know what to improve